[SPARK-57341][INFRA] Reconcile JIRA components with the PR title in merge script #56400

Closed
zhengruifeng wants to merge 3 commits into apache:master from zhengruifeng:merge-jira-component-reconcile

@zhengruifeng zhengruifeng commented Jun 9, 2026

Contributor

What changes were proposed in this pull request?

This PR adds a component-reconciliation step to dev/merge_spark_pr.py. When a PR is merged, the script already normalizes the component tags in the PR title (e.g. [SQL], [CORE], [TEST]). This change maps every title tag that corresponds to a JIRA component -- primary or not, e.g. [SQL] -> "SQL" and [TEST] -> "Tests" -- and compares that set against the components on the linked JIRA ticket. On a mismatch it prompts the committer to:

  • overwrite the JIRA ticket's components with the PR title's,
  • append the PR title's components to the ticket, or
  • keep the JIRA ticket unchanged (the default).

Tags that do not correspond to a JIRA component ([MINOR], [FOLLOWUP], version tags like [4.X], unknown tags) are ignored. The JIRA issue summary printed during a merge now also lists the ticket's components.
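The tag-to-component mapping described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in dev/merge_spark_pr.py: the table `TAG_TO_JIRA_COMPONENT` and the helpers `title_tags` and `jira_components_for_title` are hypothetical names, and the table holds only pairs mentioned in this conversation (the [CORE] -> "Spark Core" pair is an assumption).

```python
import re

# Hypothetical tag -> JIRA component table. The real registry in
# dev/merge_spark_pr.py is larger and may be organized differently.
TAG_TO_JIRA_COMPONENT = {
    "SQL": "SQL",
    "CORE": "Spark Core",  # assumed pairing
    "TEST": "Tests",
    "SHUFFLE": "Shuffle",
    "INFRA": "Project Infra",
    "PYTHON": "PySpark",
    "PYSPARK": "PySpark",  # alias of PYTHON
    "PS": "Pandas API on Spark",
}


def title_tags(title):
    """Return the leading [BRACKETED] tags of a PR title, upper-cased."""
    tags = []
    rest = title.strip()
    while True:
        match = re.match(r"\[([^\]]+)\]\s*", rest)
        if match is None:
            break
        tags.append(match.group(1).upper())
        rest = rest[match.end():]
    return tags


def jira_components_for_title(title):
    """Map title tags to JIRA component names, skipping tags with no mapping."""
    names = []
    for tag in title_tags(title):
        name = TAG_TO_JIRA_COMPONENT.get(tag)
        # Tags such as [MINOR], [FOLLOWUP], [4.X], or the SPARK-NNNNN id
        # have no entry in the table and are simply ignored.
        if name is not None and name not in names:
            names.append(name)
    return names
```

With this table, a title tagged [SQL][TEST][MINOR] would yield the components SQL and Tests, and the [MINOR] tag would be dropped.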

Why are the changes needed?

The PR title and the JIRA ticket can drift out of sync on which components a change touches. Today the merge tool resolves the ticket without checking, so a committer has to notice and fix component mismatches by hand. Surfacing the difference at merge time, with a safe default of leaving JIRA untouched, makes it easy to keep the two consistent without forcing any change.

Does this PR introduce any user-facing change?

No. dev/merge_spark_pr.py is a committer-only tool.

How was this patch tested?

The script runs its doctests on startup via doctest.testmod(). A doctest covering the tag-to-JIRA-component mapping (including non-primary tags such as [TEST] -> "Tests") was added; the full suite passes. Formatting was verified with black 26.3.1 (the repo's pinned version) against the root pyproject.toml.
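For readers unfamiliar with the run-doctests-on-startup pattern, here is a minimal sketch. The helper `normalize_tag` and the wrapper `run_startup_doctests` are hypothetical and are not taken from the merge script; only `doctest.testmod()` is the real standard-library call mentioned above.

```python
import doctest


def normalize_tag(tag):
    """Strip brackets and whitespace from a title tag and upper-case it.

    >>> normalize_tag("[sql]")
    'SQL'
    >>> normalize_tag(" [Test] ")
    'TEST'
    """
    return tag.strip().strip("[]").strip().upper()


def run_startup_doctests():
    """Run the doctests of the __main__ module and abort if any fail."""
    failure_count, test_count = doctest.testmod()
    if failure_count:
        raise SystemExit(f"{failure_count} of {test_count} doctests failed")
    return test_count


if __name__ == "__main__":
    # Run the examples in the docstrings before doing any real work.
    run_startup_doctests()
```

Because the examples live in the docstring, a pure helper like this is checked every time the tool starts, with no separate test runner.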

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

zhengruifeng changed the title from "[INFRA] Reconcile primary JIRA components with the PR title in merge_spark_pr.py" to "[SPARK-57341][INFRA] Reconcile primary JIRA components with the PR title in merge_spark_pr.py" on Jun 9, 2026
zhengruifeng changed the title from "[SPARK-57341][INFRA] Reconcile primary JIRA components with the PR title in merge_spark_pr.py" to "[INFRA] Reconcile JIRA components with the PR title in merge_spark_pr.py" on Jun 11, 2026
…spark_pr.py

When merging a PR, merge_spark_pr.py now compares the primary component tags in
the normalized PR title against the primary components on the linked JIRA ticket.
On a mismatch it prompts the committer to overwrite the JIRA's primary components
with the PR title's, append them, or keep JIRA unchanged (the default).

Non-primary tags such as [TEST] and non-primary JIRA components such as
"Optimizer" are ignored by the comparison and preserved by both updates, so a
common title like [SQL][TEST] no longer prompts against a SQL-only ticket. The
JIRA summary printed during a merge now also lists the ticket's components.

Generated-by: Claude Code (Opus 4.8)
… ones

Reconcile every PR-title tag that maps to a JIRA component, primary or not:
e.g. [TEST] -> "Tests" and [SHUFFLE] -> "Shuffle" are now handled alongside
primary tags like [SQL]. The full mapped set is compared against the ticket's
components, and the overwrite/append/keep prompt acts on that set.
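A rough sketch of the comparison step, assuming the two component lists are compared as sets so that ordering alone never triggers a prompt. The function name `describe_component_mismatch` is hypothetical, the banner wording follows the sample session in this conversation, and joining multiple names with ", " is an assumption.

```python
def describe_component_mismatch(jira_id, title_components, jira_components):
    """Return banner lines for a mismatch, or None when the sets agree."""
    # Compare as sets so ordering and duplicates do not count as a mismatch.
    if set(title_components) == set(jira_components):
        return None
    rule = "=" * 80
    return [
        rule,
        f"PR title components differ from JIRA {jira_id}:",
        f"  PR title: {', '.join(title_components)}",  # ", " separator is assumed
        f"  JIRA:     {', '.join(jira_components)}",
        rule,
    ]
```

Returning None for the matching case lets a caller skip the prompt entirely when nothing needs to be reconciled.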

This drops the earlier primary-only restriction, along with the now-unused
Component.find_by_jira_name helper and the primary_only flag on
jira_components_from_title_tags.

Generated-by: Claude Code (Opus 4.8)
zhengruifeng force-pushed the merge-jira-component-reconcile branch from 580f6e6 to 902e830 on June 11, 2026 13:01
zhengruifeng changed the title from "[INFRA] Reconcile JIRA components with the PR title in merge_spark_pr.py" to "[INFRA] Reconcile JIRA components with the PR title in merge script" on Jun 11, 2026
zhengruifeng changed the title from "[INFRA] Reconcile JIRA components with the PR title in merge script" to "[SPARK-57341][INFRA] Reconcile JIRA components with the PR title in merge script" on Jun 11, 2026
zhengruifeng marked this pull request as ready for review on June 12, 2026 09:03

zhengruifeng commented Jun 12, 2026

Contributor Author

I used this script to merge #56450 ([SPARK-57388][INFRA] Pin downstream actions/checkout to a single resolved SHA in maven_test.yml and python_hosted_runner_test.yml). Its JIRA ticket, https://issues.apache.org/jira/browse/SPARK-57388, had its component incorrectly set to 'Spark Core'; choosing [o]verwrite at the prompt corrected it:

...


Check if the JIRA information is as expected (y/N): y
JIRA is unassigned, choose assignee
[0] Ruifeng Zheng (Reporter)
Enter number of user, or userid, to assign to (blank to leave unassigned): 0

================================================================================
PR title components differ from JIRA SPARK-57388:
  PR title: Project Infra
  JIRA:     Spark Core
================================================================================
[o]verwrite JIRA with PR title / [a]ppend PR title to JIRA / [k]eep JIRA as is (default: keep): o
Updated JIRA SPARK-57388 components to: Project Infra
Enter comma-separated fix version(s) [4.3.0]:
=== JIRA SPARK-57388 ===
Summary         Pin downstream actions/checkout to a single resolved SHA in maven_test.yml and python_hosted_runner_test.yml
Assignee        Ruifeng Zheng
Status          Resolved
Components      ['Project Infra']
Url             https://issues.apache.org/jira/browse/SPARK-57388
Affected        ['4.3.0']
Fixed           ['4.3.0']

Successfully resolved SPARK-57388 with fixVersions=['4.3.0']!

@zhengruifeng zhengruifeng requested review from HyukjinKwon, cloud-fan, dongjoon-hyun and yaooqinn and removed request for cloud-fan June 12, 2026 09:07

@cloud-fan cloud-fan left a comment

Contributor

0 blocking, 0 non-blocking, 0 nits.
Clean, well-scoped committer-tooling change. The component-reconciliation step is consistent with existing merge-script conventions: it mirrors the choose_jira_assignee peer (prompt, update one JIRA field, non-fatal try/except) but only prompts on a mismatch and defaults to leaving JIRA untouched, so it is strictly less intrusive. title_components is threaded through every call site with a backward-compatible default, the new Components line in the summary printout has matching format placeholders, and the pure mapping helper is doctested (including the non-primary [TEST] -> "Tests" case). LGTM.

Comment thread dev/merge_spark_pr.py
    """
    names = []
    for tag in tags:
        c = Component.find(tag)

Member

Hm, the tag for PySpark should actually be "PYTHON", and we also have PS, which is pandas API on Spark.

Contributor Author

Good catch -- [PYTHON] is the canonical tag. The registry already maps PYTHON -> "PySpark" (with PYSPARK as an alias) and keeps PS -> "Pandas API on Spark" as a separate component, so behavior was correct; only this doctest example used the non-preferred PYSPARK alias. Updated it to use PYTHON.

Comment thread dev/merge_spark_pr.py Outdated
        return
    if choice == "o":
        new_names = list(title_jira_components)
    else:  # "a": append the PR title's components, keeping the existing ones first.

Member

Nit: if "a" is really what we're looking for here, then it may be better to just use explicit elif choice == "a" to document intent more tightly.

Contributor Author

Done -- switched the catch-all else to an explicit elif choice == "a". Safe since get_input only ever returns "o", "a", or "k", and "k" returns early.
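For illustration, the explicit branch structure discussed in this thread might look like the sketch below. It is not the merged code: `reconcile_components` is a hypothetical pure helper, and the trailing `ValueError` for an unexpected choice is an addition made here for the example.

```python
def reconcile_components(choice, title_components, jira_components):
    """Return the new JIRA component list, or None when nothing should change."""
    if choice == "k":
        # Keep: leave the ticket untouched.
        return None
    if choice == "o":
        # Overwrite: the PR title's components replace the ticket's.
        return list(title_components)
    elif choice == "a":
        # Append: keep the existing components first, then add missing ones.
        merged = list(jira_components)
        for name in title_components:
            if name not in merged:
                merged.append(name)
        return merged
    # Not reachable if the prompt only returns "o", "a", or "k".
    raise ValueError(f"unexpected choice: {choice!r}")
```

Spelling out each accepted value means a future fourth option cannot silently fall into the append path.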

…licit append branch

- Use the canonical PYTHON tag (not the PYSPARK alias) in the
  jira_components_from_title_tags doctest. The registry already maps
  PYTHON -> "PySpark" and keeps PS -> "Pandas API on Spark" separate;
  only the example used the non-preferred alias.
- Make the append branch explicit (elif choice == "a") instead of a
  catch-all else, since get_input only ever returns "o", "a", or "k".

Generated-by: Claude Code (Opus 4.8)
@zhengruifeng

Contributor Author

thanks all, merged into master

@zhengruifeng zhengruifeng deleted the merge-jira-component-reconcile branch June 30, 2026 07:08